Uniquely decodable n-gram embeddings
Author
Abstract
We define the family of n-gram embeddings from strings over a finite alphabet into a semimodule over $\mathbb{N}$. We classify all elements of this semimodule that are valid images of strings under such embeddings, as well as all elements whose inverse image consists of exactly one string (we call such elements uniquely decodable). We prove that, for a fixed alphabet, the set of all strings whose image is uniquely decodable is a regular language. © 2004 Elsevier B.V. All rights reserved.
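To make these notions concrete, here is a minimal Python sketch. It assumes the simplest member of such a family, the map sending a string to the multiset (count vector) of its length-n substrings; the paper's embeddings may be more general. Unique decodability is then decided by brute force over all strings of the length forced by the image. The names `ngram_embedding`, `preimages`, and `is_uniquely_decodable` are hypothetical illustrations, not taken from the paper.

```python
from collections import Counter
from itertools import product


def ngram_embedding(s: str, n: int) -> Counter:
    """Map a string to the counts of its length-n substrings.

    Assumption: plain n-gram counting; strings shorter than n map to
    the zero vector (an empty Counter).
    """
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def preimages(image: Counter, alphabet: str, n: int) -> list[str]:
    """Brute-force every string over `alphabet` whose embedding equals `image`."""
    total = sum(image.values())
    if total == 0:
        # Every string shorter than n maps to the zero vector.
        raise ValueError("zero image: decodes to all strings shorter than n")
    # A string of length L >= n has exactly L - n + 1 n-grams,
    # so a nonzero image fixes the length of any preimage.
    length = total + n - 1
    candidates = ("".join(chars) for chars in product(alphabet, repeat=length))
    return [s for s in candidates if ngram_embedding(s, n) == image]


def is_uniquely_decodable(image: Counter, alphabet: str, n: int) -> bool:
    """True iff exactly one string has this image."""
    return len(preimages(image, alphabet, n)) == 1
```

For example, with n = 2 over the alphabet {a, b}, the image of `abba` (one each of `ab`, `bb`, `ba`) is shared by `babb` and `bbab`, so it is not uniquely decodable, whereas the image of `aab` is. The enumeration is exponential in the string length, so it is only useful for checking small cases and is not the paper's decision procedure.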
Similar resources
On the ratio of prefix codes to all uniquely decodable codes with a given length distribution
We investigate the ratio $\rho_{n,L}$ of prefix codes to all uniquely decodable codes over an n-letter alphabet and with length distribution L. For any integers $n \ge 2$ and $m \ge 1$, we construct a lower bound and an upper bound for $\inf_L \rho_{n,L}$, the infimum taken over all sequences L of length m for which the set of uniquely decodable codes with length distribution L is non-empty. As a result, we obtain that ...
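The non-emptiness condition above is governed by the classical Kraft–McMillan inequality (a standard result, not something stated in this abstract): over an n-letter alphabet, a uniquely decodable code with lengths $l_1, \dots, l_m$ exists if and only if $\sum_i n^{-l_i} \le 1$, and whenever this holds a prefix code with the same lengths exists too. The sketch below checks the inequality exactly with `fractions.Fraction` and builds one such prefix code by the standard Kraft construction. The helper names are hypothetical, and it assumes $2 \le n \le 10$ so each code symbol prints as a single digit.

```python
from fractions import Fraction


def kraft_sum(n: int, lengths: list[int]) -> Fraction:
    """Exact value of sum_i n**(-l_i)."""
    return sum((Fraction(1, n ** l) for l in lengths), Fraction(0))


def ud_code_exists(n: int, lengths: list[int]) -> bool:
    """Kraft-McMillan: a UD code with these lengths exists iff the sum is <= 1."""
    return kraft_sum(n, lengths) <= 1


def kraft_prefix_code(n: int, lengths: list[int]) -> list[str]:
    """Build a prefix code in which codeword i has length lengths[i].

    Assumes 2 <= n <= 10 so that each code symbol is one decimal digit.
    """
    if not 2 <= n <= 10:
        raise ValueError("this sketch only handles 2 <= n <= 10")
    if not ud_code_exists(n, lengths):
        raise ValueError("Kraft inequality violated: no UD code exists")
    code = [""] * len(lengths)
    left = Fraction(0)  # left end of the next unused subinterval of [0, 1)
    # Place shorter codewords first so every boundary is a multiple of n**-l.
    for i in sorted(range(len(lengths)), key=lambda j: lengths[j]):
        l = lengths[i]
        k = int(left * n ** l)  # an integer with 0 <= k < n**l
        digits = []
        for _ in range(l):
            k, d = divmod(k, n)
            digits.append(str(d))
        code[i] = "".join(reversed(digits))  # k written in base n with l digits
        left += Fraction(1, n ** l)
    return code
```

For instance, binary lengths (3, 1, 3, 2) have Kraft sum exactly 1 and yield the prefix code `110`, `0`, `111`, `10`, while three binary codewords of length 1 violate the inequality, so no uniquely decodable code with that length distribution exists.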
Multimodal Word Distributions
Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic i...
Mixed Membership Word Embeddings for Computational Social Science
Word embeddings improve the performance of NLP systems by revealing the hidden structural relationships between words. These models have recently risen in popularity due to the performance of scalable algorithms trained in the big data setting. Despite their success, word embeddings have seen very little use in computational social science NLP tasks, presumably due to their reliance on big data...
On Embeddings of $\ell_1^k$ from Locally Decodable Codes
We show that any q-query locally decodable code (LDC) gives a copy of $\ell_1^k$ with small distortion in the Banach space of q-linear forms on $\ell_{p_1}^N \times \cdots \times \ell_{p_q}^N$, provided $1/p_1 + \cdots + 1/p_q \le 1$, where k, N, and the distortion are simple functions of the code parameters. We exhibit the copy of $\ell_1^k$ by constructing a basis for it directly from "smooth" LDC decoders. Based on this, we give alternative...
On the set of uniquely decodable codes with a given sequence of code word lengths
For every natural number $n \ge 2$ and every finite sequence L of natural numbers, we consider the set $UD_n(L)$ of all uniquely decodable codes over an n-letter alphabet with the sequence L as the sequence of code word lengths, as well as its subsets $PR_n(L)$ and $FD_n(L)$ consisting of, respectively, the prefix codes and the codes with finite delay. We derive an estimate for the quotient $|UD_n(L)|/|PR_n$...
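Whether a concrete finite code lies in $UD_n(L)$ or in its subset $PR_n(L)$ can be tested directly. The sketch below uses the classical Sardinas–Patterson algorithm for unique decodability (a standard test, not a method from the cited abstract) and a pairwise check for the prefix property; membership in $FD_n(L)$ is not covered. Codes are given as lists of strings, and all function names are hypothetical.

```python
def left_quotient(A: set[str], B: set[str]) -> set[str]:
    """All w with a + w == b for some a in A, b in B (w may be empty)."""
    return {b[len(a):] for a in A for b in B if b.startswith(a)}


def is_prefix_code(code: list[str]) -> bool:
    """No duplicates, no empty word, and no codeword prefixes another one."""
    words = set(code)
    if len(words) != len(code) or "" in words:
        return False
    return not any(a != b and b.startswith(a) for a in words for b in words)


def sardinas_patterson(code: list[str]) -> bool:
    """Return True iff the finite code is uniquely decodable."""
    words = set(code)
    if len(words) != len(code) or "" in words:
        return False  # repeated or empty codewords break unique decoding
    dangling = left_quotient(words, words) - {""}  # S_1: dangling suffixes
    seen = set()
    while dangling:
        if "" in dangling:
            return False  # some S_i with i >= 2 contains the empty word
        key = frozenset(dangling)
        if key in seen:
            return True  # the sequence S_i cycles without reaching the empty word
        seen.add(key)
        # S_{i+1} = C^{-1} S_i  union  S_i^{-1} C
        dangling = left_quotient(words, dangling) | left_quotient(dangling, words)
    return True  # S_i became empty
```

For example, {0, 01, 11} is uniquely decodable but not a prefix code, so it lies in $UD_2(L) \setminus PR_2(L)$ for L = (1, 2, 2), while {0, 01, 10} is not uniquely decodable because 010 = 0·10 = 01·0. Since every dangling suffix is a suffix of some codeword, only finitely many sets $S_i$ can occur, so the loop always terminates.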
Journal: Theor. Comput. Sci.
Volume 329
Publication date: 2004